NASA— CR-2 00299 




in Image Reconstruction and Restoration , Proc. of the SPIE 2302-23, San Diego, CA (July 1994) 


Parameter dimension of turbulence-induced phase errors 


and its effects on estimation in phase diversity 


Brian J. Thelen and Richard G. Paxman 


Environmental Research Institute of Michigan 
P.0. Box 134001, Ann Arbor, Michigan 48113-4001 
(313) 994-1200 ext. 2363 
internet email: thelen@erim.org 




ABSTRACT 

The method of phase diversity has been used in the context of incoherent imaging to estimate jointly 
an object that is being imaged and phase aberrations induced by atmospheric turbulence. The 
method requires a parametric model for the phase- aberration function. Typically, the parameters 
are coefficients to a finite set of basis functions. Care must be taken in selecting a parameterization 
that properly balances accuracy in the representation of the phase-aberration function with stability 
in the estimates. It is well known that over parameterization can result in unstable estimates. Thus 
a certain amount of model mismatch is often desirable. We derive expressions that quantify the 
bias and variance in object and aberration estimates as a function of parameter dimension. 

1. INTRODUCTION 

The method of phase diversity, first proposed by Gonsalves [1], requires the simultaneous col- 
lection of two images, as depicted in Figure 1. The first is the conventional incoherent focal-plane 
image that is degraded by system aberrations. System aberrations can arise from atmospheric 
turbulence, aberrated optical elements, or misalignments among optical elements. A simple beam 
splitter and a second detector array, translated along the optical axis, constitute a diversity channel 
that affords the collection of the second image, which is further degraded due to defocus. The 
defocus introduces a quadratic phase diversity. More general phase diversities can be introduced 
by reimaging the pupil and inserting a prescribed phase screen in the diversity channel conjugate 
to the pupil. The goal is to identify an object and phase aberrations that are consistent with both 
collected images, given the known phase diversity. 

The method of phase diversity offers several advantages over other aberration-sensing methods. 
The optical hardware is modest. A simple beam splitter and a second detector array affords the 
simultaneous collection of the two images, as illustrated in Figure 1. In addition, the method relies 
on an external reference (the object being imaged) and is therefore less susceptible to systematic 
errors introduced by optical hardware. The technique also works well for extended objects or even 
scenes. The method of phase diversity should not be confused with curvature-sensing methods, 
which have been developed for point objects. 



diversity image 2 



Figure 1: Optical layout for phase-diversity imaging. 

Several authors have investigated the use of phase diversity in a variety of applications, including 
wavefront sensing, imaging with phased-array telescopes, and solar astronomy [1-9]. Researchers 
have also developed variations on the basic phase-diversity concept. One such variation, referred to 
as phase-diverse speckle imaging , requires the collection of a pair of short-exposure diversity images 
for each of several atmospheric realizations [10-13]. This novel imaging modality appears to be 
quite promising for use in ground-based astronomy, particularly solar astronomy. Another variation 
is the use of phase diversity to correct for space- variant blur [14-16]. 

Paxman, et al . , [6] developed the theory of phase diversity within an estimation-theoretic frame- 
work. In this context, the method of phase diversity is accomplished by jointly estimating the object 
and the system phase-aberration function. The method requires a parametric model for the phase- 
aberration function. Typically, the parameters are coefficients to a set of basis functions. Obviously, 
the number of aberration parameters to be estimated must be finite. It is natural to ask how many 
aberration parameters are appropriate. Care must be taken in selecting a parameterization that 
properly balances accuracy in the representation of the phase-aberration function with stability in 
the estimates. It is well known that over parameterization can result in unstable estimates (cf. [17], 
pg 88). Thus a certain amount of model mismatch is often desirable. In this paper we introduce an 
approach for assessing the effects of phase-aberration model mismatch on the joint estimation (of 
object and aberrations) in phase diversity. Specifically we give expressions for the expected value 
and covariance of the maximum-likelihood estimates as a function of the mismatch in the phase 
aberrations, and in turn, these expressions can be used to quantify the mean-squared error (MSE) 
of the object estimate, as a function of the mismatch in the phase- aberrations. We believe that such 
a quantification will provide valuable insight into phase- aberration model selection for the phase 
diversity estimation problem. 



The organization of this paper is as follows. In the next section, we present a brief overview 
of the general model mismatch problem The general problem is that of parameter estimation from 
noisy data for which there is a model mismatch. Specifically, in this general problem, we suppose 
we have two models, I and II, which potentially describe the generation of the data, and we are 
using model I as the basis for the parameter estimation. However we are concerned that model II 
is actually more accurate and that there will be adverse effects on our estimation due to the model 
mismatch. As a concrete example within phase diversity, model II might correspond to a fully 
parameterized phase aberration model, while model I corresponds to a reduced parameterization 
of the phase aberrations. Assuming a maximum-likelihood estimation (MLE) framework, we then 
present an analytic characterization of the estimates (MLE) in the general mismatched problem. 
In two subsequent subsections, this is followed up with applications of the theory to two special 
cases which are of interest in phase diversity. In section 3, we apply the theory specifically to phase 
diversity to derive expressions for the expected value and covariance for the object estimate in the 
case of a mismatched phase aberration model and Gaussian noise. In the Gaussian noise case, these 
expressions can be numerically computed for moderate sized images. Covariance computations in 
the Poisson-noise case are more challenging. The final section summarizes the results of the paper 
and discusses areas for future research. 

2. GENERAL THEORY 

This section describes the general theory for MLE in the case of model mismatch. Specifically, 
we give analytic expressions for the approximate bias and covariance of the MLE in the case of 
model mismatch. We then apply these general results to two special cases. The first case, discussed 
in Subsection 2.1, is that of independent and identically distributed (iid) multivariate Gaussian 
random vectors with known covariance matrix, which is assumed to be a scalar multiple of the 
identity matrix, and a mean vector which is a function of the underlying (unknown) parameter. 
The second case, discussed in Subsection 2.2, is that of iid multivariate Poisson random vectors, 
where the components are independent with a mean vector which is a function of the underlying 
parameter. 

For the general theory we assume we have noisy multivariate data which is modeled as iid 
random vectors Xi, X 2 , . . . ,X n . The random vectors are assumed to have a continuous distribution 
specified by a probability density function (pdf) or a discrete distribution specified by a probability 
mass function (pmf). For the sake of concreteness, we present the theory for the case of a continuous 
distribution (i.e., pdf’s), but the theory carries over easily to the case of discrete distributions. We 
assume that we have two models: model I, the basis for MLE, and model II is the “true” model. 
Under model I, the pdf for X\ is given by f(x\ 9) where 0 6 0 with 0 being a finite-dimensional 
parameter space. Under model II, X\ has the true pdf f(x ; 9*) where 6* E 0* is fixed, but need 
not be finite dimensional. For notational convenience, we use the same functional representation 
for all pdf’s with the parameter notation dictating which is appropriate. Also, let Eg, Eg* denote 
the expectation operators associated with the parameters 9 and 9* respectively. 

In the matched model case, i.e., in the case were model I actually is true, the maximum likelihood 
estimator 9 n , defined by 

n 

e n = argmax e JJ /(X, ; 6), 


( 1 ) 



has very well-known statistical properties provided that the pdf’s are three-times differentiable with 
respect to the parameters and the partial derivatives satisfy certain technical conditions (cf. [18] or 
[19]). Specifically it is known to be an efficient estimator (smallest variance over the whole parameter 
space) which has a distribution well approximated by a multivariate Gaussian distribution with a 
mean vector of 9 and a covariance equal to L times the inverse of the Fisher information matrix. 
Here, the Fisher information matrix is given by 


P(*)]« = E, 


where L n (6) is the log-likelihood function, i.e. 


’dLi(fl) dL\{0) 

d$i dOj 
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£»(«) = £l°g(A*i;fl)). 

3=1 


( 3 ) 


As discussed in the Introduction, we 


are interested in the properties of the MLE 9 n in the 
mismatched case, i.e., where model II is actually true. For this analysis, we assume that model I 
is used to generate the MLE’s of 0 and we assume that the true pdf is /(•;#*) corresponding to 
model II. We also make the assumption that all pdf’s have common support, i.e., the set where 
they are not equal to 0 is the same for all 6 and #*. For fixed 0, and making mild assumptions on 
statistical model I, similar to those standardly assumed in MLE theory (cf. [19]), one can show 
that 9 n converges in probability to 0 o £ 0 as n gets large, where 9 0 satisfies that 


Oo = argmax 0 E^(log(/(Xi; 0)). (4) 

Here we are assuming that the expected value in the RHS of (4) is well-defined and finite for all 
9. For the rest of this paper, this convergence is assumed to hold for the MLE. One can show that 
the maximization in the RHS of (4) is equivalent to determining the value of 9 which minimizes 
the relative entropy (Kullback-Liebler distance) between the pdf /(•;#*) and the pdf /(*;#), i.e., 
the pdf /(•; 9 0 ) is that pdf in {/(•;#) : 9 € 0} which is closest to /(•;$*) in “relative entropy 

distance.” Assuming that 9 n converges in probability to # 0 , and assuming that the pdf’s f(x]9) 
are nice smooth functions of 8 for fixed z, we have a generalization of the usual result for MLE’s. 
Specifically it can be shown that 8 n has a distribution well approximated by a multivariate Gaussian 
distribution with mean 9 0 and a covariance matrix given by 

faVoT'JiiOMfte.))- 1 (5) 


where 


and 
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Note that in the case of a matched model, the J* and J% matrices were actually equal and in fact 
were both equal to the Fisher information matrix. In the mismatched case, these matrices are in 



general not equal, as we will see in the special cases. In the case of a 1-D parameter space 0, the 
proof of the above is very similar to the result stated and proved in [19]. So far in this section, we 
have provided general expressions for the approximate expected value and covariance of the MLE. 
In the two subsequent subsections, we present expressions for the expected value and covariance in 
two special cases of interest for phase diversity (multivariate Gaussian and multivariate Poisson). 

2.1 Application to a Gaussian model 


In this subsection, we specialize the result to the case where Xi, X 2 , . . . , X n are iid M-dimensional 
multivariate Gaussian under both models I and II with a common covariance matrix of a 2 1 where 
I is M x M identity matrix. In model I, we assume that the mean vector is a function of #, i.e., 

/*i(0) 

M2 (0) 

Hm{9) . 

and under model II, we assume that the mean vector is a function of #*, i.e., 



K°) = 


y(0*) = 


I* i(0*) 


L J 


(9) 


Again, though the functions ji i, . . . ,/im are actually different for each of the two models, for the 
sake of notational convenience we do not introduce extra notation to distinguish between the two. 
The log-likelihood function, for model I and one random vector Xi, is given by 

-j M 

= £(*»-/**(*))’■ ( 10 ) 

Z<J m= 1 

By easy computations one can show that 9 0 is the value of 9 satisfying that it minimizes the 
Euclidean norm between the mean vectors (i{9) and /i(0*), i.e., 

M 

9 0 = argmin 0 J2 0^(0*) - M™(0)) 2 , (11) 

m — 1 


and Jl ( 9 0 ) is given by 
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and Jl{e 0 ) is given by 


m 


i M 

i3 = ^E 


m— 1 L 


d Hm{Qo) / \ { Q \\ I 

\l^m\P *) /^m(y 0 )J T 


d9i de 3 


dO i 


99, 


(13) 


2.2 Application to a Poisson model 

In this subsection, we specialize the result to the case where Xi, X 2 , . . . , X n are iid M-dimensional 
multivariate Poisson with independent components under both models I and II. Under model I, we 
assume that the mean vector is a function of 9 , i.e., 
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and under model II, we assume the mean vector is a function of $*, i.e., 


A (9*) = 


Ai(0*) 
Ho') 

L Am(#*) J 


(15) 


Again, though the functions Ai, . . . , A u are actually different for each of the two models, for the 
sake of notational convenience we do not introduce extra notation to distinguish between the two. 
The log-likelihood for model I and one random vector X\ is given by 


M 


H«) = E (V lro log(A m ( 0 )) - H{ 0 ) - log(* lm !)) 


ra= 1 


Now one can show that 9 0 is the value of 9 satisfying that 
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and Jl(9 0 ) is given by 
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and Jl(0 o ) is given by 
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3. APPLICATION TO PHASE DIVERSITY 


In this section, we apply the general theory to the mismatched estimation problem within 
phase diversity. We take as our starting point the general phase diversity imaging model which is 
characterized by the following equation: 

9k(x) = f( x ') $ k(x - x') ( 20 ) 

x'ex 

where / is the object array, s k is the point-spread function (PSF) having diversity k, g k {x) is the 
expected k th diversity image, £ is a two-dimensional coordinate, and 

T = {0,...,A-l}x{0,...,A-l}. (21) 

We treat the object, the PSF’s, and the images as periodic arrays with period cell of size N x N. 
The model for the k th PSF is that 

s k {x) = \h k (x)\ 2 (22) 

where h k is the inverse discrete Fourier transform of the generalized pupil function, 

H k (u) = A k (u) exp {i{</>(u) + 0&(u))}, (23) 

where A k (u) is the binary aperture function, cj) 18 the unknown phase-aberration function that we 
would like to estimate, and 0 k the known phase function associated with the k th diversity image. It 
is typical to parameterize the phase-aberration function: 

<£(«) = ( 24 ) 

3 = 1 

where the Jt coefficients, oq,. . . ,nj t , serve as parameters and is a convenient set of basis 

functions, such as discretized Zernike polynomials. With this parameterization of </>, the PSF 
depends on the parameter vector a and similarly the noiseless image values, g k (x) depend on / 
and a. Since we will be taking derivatives with respect to object and aberration parameters, we 
make this dependence explicit by writing sj^ajjar) and g k (x;f, a). We now consider the case where 
the noise at each detector element is modeled as iid zero-mean Gaussian with a variance of a 2 . As 
discussed earlier, for a fixed finite set of basis functions, there is a question of how many to include 
in the model, i.e., how many aberration coefficients to estimate. One wants the dimensionality to be 
large enough to provide an accurate approximation, but not so many so as to cause instability of the 
aberration and/or object estimates. We can apply the theory developed in the previous subsection 
2.1 to give analytic expressions for the effects of the mismatched phase-aberration model. Our true 
model in this setting is that the parameter 6 * € 0* consists of 

0* = ({/*(a:) : x G A}, £ 0* = R* x R Jt (25) 

and the mismatched model corresponds to 

e = ({/(*) : x S X}, totf) e 0 = R+ x R 7 ' 


( 26 ) 



where J < J t , R denotes the real numbers, and R* denotes the space of all possible nonnegative 
objects. Notationally let the “true” object be denoted by /* and the “true” aberrations be denoted 
by the Jf-dimensional vector a*. For moderate to high signal-to-noise ratios, the current stochastic 
model is statistically equivalent to a model where one observes an iid sequence of n white noisy 
images, each which is multivariate Gaussian with the same mean and a variance of na 2 . Hence in 
this case, our previous theory can be applied to this problem to assess effects of model mismatch 
on MLE. 

The first step in this application is to determine the expected value for / and . Based on 
the results from subsection 2.1, it is easy to show that this corresponds to the values of / and 
a ( J ) = {aj}{ which minimize the sum of squared differences between the two sets of (noiseless) 
diversity images, he., the object f Q and phase parameters ay) satisfying 

(/o,«i J) ) = argmin f ^ {J) ^ \g k (x; /*,a*) - g k (x; | 2 . (27) 

k= i xex 

Note that this is essentially the standard MLE estimation problem for / and aberrations 
where one assumes that the data are the noiseless diversity images generated by /* and a * (cf. 
[6]). Therefore existing nonlinear optimization methods that are used for phase diversity estimates 
can be used here to find the expected values. From these expressions for the MLE expected value 
we can compute the bias for the mismatched MLE of the object. Specifically the bias is given by 
jf 0 (u) — /*(«), where f 0 was determined from solving for the MLE. 

The second step is to compute the covariance which depends on To do this we need 

to compute the matrices J*(/ 0 ,a( J )) and and apply the formula in (5). For notational 

convenience, we replace f 0 by / and a ^ by a for the rest of this section. The expressions for these 
matrices involves the first and second partial derivatives of the mean data vector, which in this case 
corresponds to the noiseless diversity images, with respect to the parameters. Using (20), it can be 
shown that 
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Similarly the J 2 matrix is given by 
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The computation of the covariance matrix for the mismatched object MLE involves computing the 
upper (/, /) quadrant of the total covariance matrix 




(36) 


Now the matrices J* and J 2 are square matrices whose dimension is the sum of the object dimension 
plus the aberration dimension, which for typical real problems is quite large. This presents a 
potential problem in computing the covariance matrix in (36), due to the fact that we have to do 



an inversion of the J l matrix. However if one carefully inspects the above expressions, especially 
(32), after lexicographic ordering the parameter f(u) with respect to the two-dimensional variable 
u, the upper lefthand (/, /) quadrant of J* matrix is block circulant and so one can easily find 
the inverse of this quadrant using the FFT. If the dimension of the aberration parameterization 
is relatively small, one can use this and the formula for inverses of partitioned matrices (cf. pg 
390-391 in [17]) to derive expressions for the inverse of J* matrix relatively easily. It only requires 
the multiplication of matrices and the inversion of matrices whose dimension is the dimension of the 
phase aberration parameterization. Thus in this case, the computation of the bias and covariance 
of the object MLE appears reasonable, and we are currently implementing this in software to do 
computations on simulated data. 

Using the results in subsection 2.2, we have also derived analogous expressions for the (mis- 
matched) MLE expected value and covariance in the phase diversity joint estimation problem for 
the case that Poisson noise is dominant. Again the expected value is computed by solving the MLE 
problem with the noiseless diversity images as data. However the nice block-circulant structure 
which was present in the J* matrix for the Gaussian noise case is no longer present in the Poisson 
noise case, so that the numerical computation of the covariance matrix appears to be considerably 
more difficult than was true in the Gaussian case. Alternative computational approaches in the 
Poisson case, which exploit other special structures of the matrices is the object of current research. 

4. SUMMARY 

In this paper, we have presented an approach for assessing the affects of model mismatch on 
joint estimation in phase diversity. We have given expressions for the expected value and the co- 
variance of the MLE in the case of a mismatched phase aberration model and Gaussian noise. The 
expected value can be computed by solving a phase diversity estimation problem using noiseless 
diversity images as data. The covariance can be computed by inverting matrices whose dimension 
is that of the total dimension of all the parameters (object and aberration) and then doing matrix 
multiplication. In the case of moderate size objects, a brute force matrix inversion may be numer- 
ically impractical. However in the Gaussian case, invoking some special matrix structures which 
are present, we presented a reasonable numerical approach for computing the inverse and hence for 
computing the covariance matrix. Currently, we have no analogous simplification which holds in 
the Poisson case. This overall approach to assessing model mismatch within phase diversity carries 
over directly to the case of phase diverse speckle and we have worked out the details. 

Areas for future research include: 

(i) Application of the approach to simulated data for the Gaussian noise case. In particular, 
apply the methodology to quantify MLE performance as a function of different aberration param- 
eterizations for a variety of SNR’s, objects, and phase aberrations. 

(ii) In the Poisson case, investigating the existence of special matrix structures (within J*) 
which would make the numerical computation of the covariance matrix practical for moderate size 
objects. 
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